- A Description of the DOS File System
-
- by Philip J. Erdelsky
- PLAIN VANILLA CORPORATION
- CompuServe 75746,3411
- InterNet 75746.3411@compuserve.com
-
- January 15, 1993
-
- Copyright (c) 1993 Plain Vanilla Corporation
-
-
- 1. Introduction
- ---------------
-
- The DOS file system is used not only by IBM PC's and compatibles, but also by
- some standalone word processors that use diskettes and by a number of other
- devices. Even computer systems that normally use a different file system, such
- as UNIX-based workstations, also have the ability to read and write diskettes
- in DOS format. All of these uses together make the DOS file system the single
- most widely used file system in the world.
-
- The file system has evolved a great deal since its origin in DOS 1.0. The
- version described here is essentially that used by DOS 3.3. The large
- partitions used by DOS 4.0 and later versions of DOS are not included.
-
- This description was written in conjunction with version 2.0 of the Plain
- Vanilla Reentrant DOS-Compatible File System, an implementation of the basic
- DOS file system written in portable C and distributed as free software.
-
- There are a number of details regarding the DOS file system that are
- undocumented. Some of them have been filled in by experimentation; others are
- simply described as undefined.
-
- 2. Block Devices
- ----------------
-
- A DOS file system resides on what is generally called a block device. This is
- usually a disk of some kind, but it may also be a part of regular or extended
- memory that has been configured as a RAM disk.
-
- A block device stores information in sectors of equal size. The most common
- sector size is 512 bytes, which is used by all standard DOS floppy diskettes,
- but some RAM disks and hard disks may use sectors containing 128, 256 or 1024
- bytes.
-
- The sectors are numbered in a very simple way. Each sector has a logical sector
- number. The smallest logical sector number is zero, and the largest logical
- sector number is one less than the number of sectors. Every logical sector
- number between these limits is assigned to exactly one sector. There are no gaps
- in the numbering system.
-
-
- 3. Mapping Sides and Tracks
- ---------------------------
-
- The logical sector numbering system usually hides some complexities that must
- be handled by the disk controller hardware, the disk driver software or both.
- The sectors on a disk are arranged into concentric circles called tracks. Most
- floppy disks have tracks on both sides. A hard disk has more than two sides,
- because it consists of two or more disks, rotating in unison on the same
- spindle.
-
- Some disk controllers, especially hard disk controllers, accept logical sector
- numbers and make all conversions internally; but this is not always the case.
-
- To read or write a sector, the disk controller first moves a reading and
- writing device, called a head, back or forth until it is positioned over the
- track containing the desired sector. If the disk has tracks on more than one
- side, there is a head for each side of the disk, and they move together. The
- disk controller selects the proper head by electronic switching. Finally, the
- disk controller waits until the disk rotation brings the desired sector under
- the head. Then it reads or writes the sector.
-
- To convert a logical sector number L into the corresponding side, track and
- sector numbers, the controller (or the software driver) usually needs the
- following three numbers:
-
- N = the number of sectors per track,
- T = the number of tracks per side, and
- S = the number of sides.
-
- The tracks are usually numbered from 0 to T-1, with track 0 on the outside and
- track T-1 on the inside. This puts the most frequently accessed sectors (those
- with low logical sector numbers) on the outside of the disk, where sectors are
- largest and the disk is the most reliable.
-
- The sides are usually numbered from 0 to S-1, in an order determined by the
- hardware design.
-
- The sectors in a single track are numbered from 1 to N, where the disk rotation
- carries them under a head in order of increasing sector number. The numbering
- starts with 1, not 0. The reason for this odd convention is not clear, but it
- is followed by a number of floppy disk drives, including the ones usually used
- with DOS-based computers.
-
- The assignment of logical sector numbers to sectors on a disk follows a simple
- scheme with a straightforward object -- to minimize access time when the
- sectors are read or written in order of ascending logical sector numbers. This
- usually means minimizing the number of times the heads must be moved, since
- this is by far the slowest kind of disk operation.
-
- The formulas are as follows, where A rem B means the remainder when A is
- divided by B, and A/B means the quotient with the remainder discarded:
-
- sector number = L rem N + 1
- side number = (L/N) rem S
- track number = (L/N) / S
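-
- The formulas above can be sketched in Python as follows. This is only an
- illustration: the function name logical_to_chs and its parameter names are
- invented here, and the optional range check on the track number is an
- assumption about how a driver might guard against moving the heads too far.

```python
def logical_to_chs(logical, sectors_per_track, sides, tracks=None):
    """Map logical sector number L to a (track, side, sector) triple."""
    sector = logical % sectors_per_track + 1          # sectors count from 1
    side = (logical // sectors_per_track) % sides
    track = (logical // sectors_per_track) // sides
    # Optional sanity check; T is needed for nothing else.
    if tracks is not None and track >= tracks:
        raise ValueError("track number out of range")
    return track, side, sector
```

- With N = 9 and S = 2, logical sector 0 maps to track 0, side 0, sector 1, and
- logical sector 18 is the first one that requires the heads to move to track 1.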
-
- Logical sector number 0 is sector 1 in track 0 on side 0. Since this sector
- can always be located without knowing the disk structure, it is often used to
- store information about the disk structure. This is important for floppy disks,
- which are removable. Many floppy disk drives can accommodate disks with more
- than one format. The file system code can read logical sector 0 to find out
- how to locate other sectors.
-
- As the logical sector number increases, the heads must be moved only when the
- track number increases. This does not happen until all sectors with a
- particular track number have been accessed.
-
- The number of tracks is needed only to check the track number to see whether it
- is in range. This is advisable in some cases because some disk drives may be
- damaged if an attempt is made to move the heads beyond their normal range.
-
-
- 4. Handling Bad Sectors
- -----------------------
-
- On any large hard disk, there are apt to be some bad spots. Some hard disk
- controllers can remap the bad spots and make the disk appear to be a perfect
- disk with a slightly smaller capacity. This is usually done during low-level
- formatting.
-
- However, floppy disk controllers and some hard disk controllers cannot do this.
- To handle most cases of this kind, the DOS file system can be set up so that
- some sectors are marked as bad. No DOS file information will be written into
- bad sectors. This lowers the disk capacity, but it makes imperfect disks
- usable. This is usually done when the disk is formatted.
-
- If there is a bad sector in a critical area of the disk, such as the part that
- indicates which sectors are bad, the disk cannot be used to hold a DOS file
- system. Fortunately, the critical area of a disk is quite small, so this is
- quite unlikely.
-
-
- 5. Disk Partitions
- ------------------
-
- The original DOS file system used 16-bit logical sector numbers, so it could
- handle only file systems with at most 65535 sectors, or 32 megabytes when
- 512-byte sectors were used. This was the infamous 32-megabyte barrier. DOS
- could use a disk with more than 65535 sectors, but it could use only 65535
- sectors for its file system.
-
- A simple solution to this problem is to divide the disk into two or more
- partitions, each of which contains no more than 65535 sectors, and to treat
- each partition as though it were a separate disk.
-
- DOS 4.00 and later versions of DOS can use 32-bit logical sector numbers, so
- they can handle larger partitions. With 512-byte sectors, this would seem to
- imply a 2-terabyte maximum, but other limitations make the practical maximum
- smaller than that.
-
- This document covers only partitions of 32 megabytes or less that use 16-bit
- logical sector numbers.
-
-
- 6. The Layout of a DOS Disk
- ---------------------------
-
- The sectors of a DOS disk are divided into the following groups, which are
- listed in order of increasing logical sector numbers:
-
- RESERVED SECTORS
- FILE ALLOCATION TABLES
- ROOT DIRECTORY
- DATA AREA
- HIDDEN SECTORS
-
- The groups are described in detail in the following sections.
-
-
- 7. Reserved Sectors
- -------------------
-
- The first reserved sector, which is always logical sector 0, is also called the
- bootstrap sector.
-
- Here is what is in a 512-byte bootstrap sector:
-
-    byte(s)    contents
-    -------    -------------------------------------------------------
-    0-2        first instruction of bootstrap routine
-    3-10       OEM name
-    11-12      number of bytes per sector
-    13         number of sectors per cluster
-    14-15      number of reserved sectors
-    16         number of copies of the file allocation table
-    17-18      number of entries in root directory
-    19-20      total number of sectors
-    21         media descriptor byte
-    22-23      number of sectors in each copy of file allocation table
-    24-25      number of sectors per track
-    26-27      number of sides
-    28-29      number of hidden sectors
-    30-509     bootstrap routine and partition information
-    510        hexadecimal 55
-    511        hexadecimal AA
-
- Two-byte fields are always arranged in "little endian" order, with the less
- significant byte first. Notice that some two-byte fields are not aligned on
- even addresses. This may require special treatment by CPU's such as the
- Motorola 68000, which cannot directly access words at odd addresses.
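-
- As an illustration, the fixed fields in bytes 0-29 can be unpacked with
- Python's standard struct module, which handles both the little-endian byte
- order and the unaligned fields. The field names and the function name
- parse_boot_sector are made up for this sketch and are not DOS terminology.

```python
import struct

# Bytes 0-29 of the bootstrap sector: "<" selects little-endian byte order
# with no alignment padding, so odd offsets need no special treatment.
BOOT_FORMAT = "<3s8sHBHBHHBHHHH"
BOOT_FIELDS = (
    "jump", "oem_name", "bytes_per_sector", "sectors_per_cluster",
    "reserved_sectors", "fat_copies", "root_entries", "total_sectors",
    "media_descriptor", "sectors_per_fat", "sectors_per_track",
    "sides", "hidden_sectors",
)

def parse_boot_sector(sector):
    """Return the fields of bytes 0-29 as a dictionary."""
    values = struct.unpack_from(BOOT_FORMAT, sector, 0)
    return dict(zip(BOOT_FIELDS, values))

def has_boot_signature(sector):
    """Check bytes 510 and 511 of a 512-byte bootstrap sector."""
    return sector[510:512] == b"\x55\xaa"
```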
-
- The first instruction of the bootstrap routine is always a jump instruction
- that transfers control to the bootstrap routine in bytes 30-509.
-
- The OEM name is usually the version of DOS or the name of the utility that was
- used to format the disk, represented by eight ASCII characters. It may even be
- completely blank. DOS apparently makes no use of the OEM name.
-
- The total number of reserved sectors, including the bootstrap sector, is
- specified by the contents of bytes 14 and 15. Reserved sectors other than the
- bootstrap sector are not used by DOS. Most DOS disks have only one reserved
- sector, but a larger number may be used to reserve space for a non-DOS
- partition or to cause DOS to avoid one or more bad sectors.
-
- For most 5-1/4 inch and 3-1/2 inch diskettes, the media descriptor byte conveys
- some of the same information redundantly:
-
-   size                               5-1/4  5-1/4  5-1/4  5-1/4  5-1/4  3-1/2  3-1/2
-   density (D = double, H = high)     D      D      D      D      H      D      H
-   sides                              1      1      2      2      2      2      2
-   media descriptor byte              FE     FC     FF     FD     F9     F9     F0
-   bytes per sector                   512    512    512    512    512    512    512
-   sectors per cluster                1      1      2      2      1      2      1
-   reserved sectors                   1      1      1      1      1      1      1
-   copies of file allocation table    2      2      2      2      2      2      2
-   entries in root directory          64     64     112    112    224    112    224
-   total sectors                      320    360    640    720    2400   1440   2880
-   sectors per file allocation table  1      2      1      2      7      3      9
-   sectors per track                  8      9      8      9      15     9      18
-   hidden sectors                     0      0      0      0      0      0      0
-
- In addition, the media descriptor byte is repeated in the file allocation
- table, which is described below.
-
- On diskettes formatted with only 8 sectors per track, the information given in
- bytes 3-29 is absent, and DOS must examine the media descriptor byte in the
- file allocation table to determine the exact disk format.
-
- What DOS will do with a diskette that contains nonstandard but consistent
- parameters in its bootstrap sector is undefined. In some cases it will read
- and write the diskette properly; in other cases it will not.
-
-
- 8. File Allocation Tables
- -------------------------
-
- The sectors immediately after the last reserved sector hold one or more copies
- of the file allocation table (sometimes called the FAT). The number of copies
- is specified by byte 16 in the bootstrap sector, and the number of sectors in
- each copy is specified by bytes 22 and 23 in the bootstrap sector.
-
- DOS uses only the first copy of the file allocation table, but it updates other
- copies, if any, to keep them identical to the first copy. Then if the first
- copy develops a bad sector, a special file recovery program can be used to
- retrieve the information from another copy.
-
- Most floppy and hard disks contain two copies of the file allocation table.
- Most RAM disks contain only one because RAM is a much more reliable medium.
-
- Versions of DOS before 3.00 used a file allocation table with 12-bit entries.
- Each group of three consecutive bytes contains two 12-bit entries, arranged as
- follows:
-
- The first byte contains the eight least significant bits of the first
- entry.
-
- The four least significant bits of the second byte contain the four most
- significant bits of the first entry.
-
- The four most significant bits of the second byte contain the four least
- significant bits of the second entry.
-
- The third byte contains the eight most significant bits of the second
- entry.
-
- In other words, if UV, WX and YZ are the hexadecimal representations of the
- three consecutive bytes, then the entries are XUV and YZW, respectively.
-
- This odd arrangement is actually quite easy to implement in 8086 assembly
- language. Yes, a file allocation table entry can be split between sectors.
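-
- The same unpacking is short in a high-level language too. The following
- Python sketch assumes that a whole copy of the file allocation table has been
- read into one contiguous byte string, so an entry split between sectors needs
- no special handling. The name fat12_entry is hypothetical.

```python
def fat12_entry(fat, n):
    """Return 12-bit entry number n from the raw bytes of one FAT copy."""
    offset = (n // 2) * 3                 # two entries share three bytes
    b0, b1, b2 = fat[offset:offset + 3]
    if n % 2 == 0:
        # First entry of the pair: byte 0 plus the low nibble of byte 1.
        return b0 | ((b1 & 0x0F) << 8)
    # Second entry of the pair: high nibble of byte 1 plus byte 2.
    return (b1 >> 4) | (b2 << 4)
```

- For the three bytes 12, 34 and 56 (hexadecimal), the two entries come out as
- 412 and 563.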
-
- A file allocation table with 12-bit entries is suitable for floppy diskettes,
- but proved inadequate for hard disk partitions, even those within the
- 32-megabyte limit. With DOS 3.0, an alternative format with 16-bit entries was
- introduced. A 16-bit entry occupies two consecutive bytes, stored in "little
- endian" order, with the less significant byte first.
-
- The file allocation table entries are numbered consecutively. Entry 0 (the
- first entry) contains the media descriptor byte (byte 21 in the bootstrap
- sector), padded at the more significant end with ones to make it a 12-bit or
- 16-bit value. Entry 1 contains either 12 or 16 ones.
-
- For file allocation purposes, the data area of the disk is divided into a
- number of clusters. Each cluster is the same size, and consists of the number
- of consecutive sectors specified in byte 13 of the bootstrap sector.
-
- The clusters are numbered consecutively, starting with cluster number 2, which
- lies at the beginning of the data area.
-
- Each entry in the file allocation table, except entries 0 and 1, is associated
- with the cluster with the same number as the entry. This is quite consistent,
- because there are no clusters with number 0 or 1.
-
- The contents of each such entry are interpreted as follows:
-
-    contents in hexadecimal
-    12-bit     16-bit       meaning
-    -------    ---------    ---------------------------------------------------
-    000        0000         cluster is unassigned and available
-    001        0001         invalid entry
-    002-FEF    0002-FFEF    cluster is assigned, contents is the number of
-                            the next cluster in the same file or subdirectory
-    FF0-FF6    FFF0-FFF6    reserved
-    FF7        FFF7         cluster contains a bad sector
-    FF8-FFF    FFF8-FFFF    cluster is the last cluster assigned to a file
-                            or subdirectory
-
- Whether 12-bit or 16-bit entries are used depends on the number of clusters in
- the data area. If the range from 002 to FEF is sufficient to number them all,
- 12-bit entries are used. Otherwise, 16-bit entries are used.
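-
- A minimal Python sketch of these two rules follows. It applies the rule
- exactly as stated above (12-bit entries whenever the highest cluster number
- fits in the range 002-FEF); the threshold used by any particular DOS version
- has not been verified here. The function names are invented for the example.

```python
def fat_entry_bits(cluster_count):
    """Choose the entry width from the number of clusters in the data area."""
    highest_cluster = cluster_count + 1       # clusters are numbered from 2
    return 12 if highest_cluster <= 0xFEF else 16

def classify_entry(value, bits):
    """Interpret the contents of a file allocation table entry."""
    top = 0xFFF if bits == 12 else 0xFFFF     # all ones at this width
    if value == 0:
        return "free"
    if value == 1:
        return "invalid"
    if value <= top - 0x10:                   # 002-FEF or 0002-FFEF
        return "next"
    if value <= top - 0x09:                   # FF0-FF6 or FFF0-FFF6
        return "reserved"
    if value == top - 0x08:                   # FF7 or FFF7
        return "bad"
    return "last"                             # FF8-FFF or FFF8-FFFF
```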
-
- Notice that the restrictions on cluster size and the number of clusters put an
- upper limit on the partition size.
-
- Each subdirectory, and each file that contains at least one byte of data, is
- associated with a chain of file allocation table entries, one for each cluster.
- The directory entry for the file or subdirectory, which is described below,
- contains the number of the first cluster. The entry for each cluster other
- than the last cluster contains the number of the next following cluster. The
- entry for the last cluster contains a value in the range FF8-FFF or FFF8-FFFF,
- which marks the end of the chain.
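-
- Following a chain can be sketched as below. The function takes the first
- cluster number and a caller-supplied function read_entry that returns the
- table entry for a given cluster; both names are assumptions of this example.
- The checks for out-of-range links and loops are a defensive addition, since
- what DOS does with a damaged chain is not described here.

```python
def cluster_chain(first_cluster, read_entry, bits=12):
    """Return the list of clusters in a chain, in order."""
    end_of_chain = 0xFF8 if bits == 12 else 0xFFF8
    highest_next = 0xFEF if bits == 12 else 0xFFEF
    chain = []
    cluster = first_cluster
    while cluster < end_of_chain:
        # A valid link lies in 002-FEF (or 0002-FFEF) and never repeats.
        if not 2 <= cluster <= highest_next or cluster in chain:
            raise ValueError("broken chain at cluster %#x" % cluster)
        chain.append(cluster)
        cluster = read_entry(cluster)
    return chain
```

- The number of clusters in the returned list, multiplied by the cluster size,
- gives the space allocated to the file or subdirectory.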
-
-
- 9. Root Directory
- -----------------
-
- The root directory contains one 32-byte entry for each file in the root
- directory of the disk. If the disk has a volume label, the root directory also
- contains a 32-byte entry for that.
-
- The maximum number of root directory entries is specified in bytes 17 and 18 of
- the bootstrap sector. This number, together with the sector size, determines
- the number of sectors allocated to the root directory.
-
- The value in bytes 17 and 18 of the bootstrap sector is always chosen so that
- the root directory occupies a whole number of sectors. What DOS will do
- otherwise is undefined.
-
-
- 10. The Data Area and Hidden Sectors
- ------------------------------------
-
- The total number of sectors given in bytes 19 and 20 of the bootstrap sector
- includes all reserved sectors and all sectors in the root directory, the data
- area and all copies of the file allocation table. It does not include the
- hidden sectors. The size of the data area is not given directly, but it can be
- calculated from this information. Apparently it must be an exact multiple of
- the cluster size. What DOS will do otherwise is undefined.
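-
- The calculation can be sketched in Python as follows, using the bootstrap
- sector fields as inputs. The function and key names are hypothetical. The
- mapping from a cluster number to its first logical sector relies on the fact
- that cluster 2 lies at the beginning of the data area.

```python
def disk_layout(bytes_per_sector, sectors_per_cluster, reserved_sectors,
                fat_copies, sectors_per_fat, root_entries, total_sectors):
    """Locate each group of sectors by its first logical sector number."""
    root_sectors = root_entries * 32 // bytes_per_sector   # 32-byte entries
    first_fat = reserved_sectors
    first_root = first_fat + fat_copies * sectors_per_fat
    first_data = first_root + root_sectors
    data_sectors = total_sectors - first_data
    return {
        "first_fat_sector": first_fat,
        "first_root_sector": first_root,
        "first_data_sector": first_data,
        "data_sectors": data_sectors,
        "cluster_count": data_sectors // sectors_per_cluster,
    }

def cluster_to_sector(cluster, first_data_sector, sectors_per_cluster):
    """Return the first logical sector of a cluster (clusters start at 2)."""
    return first_data_sector + (cluster - 2) * sectors_per_cluster
```

- For a 720-sector diskette with two 2-sector tables, 112 root directory
- entries and 2-sector clusters, the data area starts at logical sector 12 and
- holds 354 clusters.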
-
- The total number of sectors, plus the number of hidden sectors given in bytes
- 28-29 of the bootstrap sector, is the true total number of sectors in the
- partition. This is equal to the product of the number of sectors per track,
- the number of tracks per side, and the number of sides. The number of tracks
- per side is not given, but it can be easily calculated from this information.
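-
- A short Python sketch of that calculation follows. The function name is
- invented, and raising an error when the division leaves a remainder is an
- assumption of this example rather than documented DOS behavior.

```python
def tracks_per_side(total_sectors, hidden_sectors, sectors_per_track, sides):
    """Derive T from the sector counts in the bootstrap sector."""
    true_total = total_sectors + hidden_sectors
    tracks, leftover = divmod(true_total, sectors_per_track * sides)
    if leftover:
        raise ValueError("sector count is not a whole number of tracks")
    return tracks
```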
-
- Usually, the hidden sectors are simply a few leftover sectors that could not be
- included in the data area because of various format constraints, but a large
- number of hidden sectors could be used to conceal a second partition.
-
-
- 11. Directory Structure
- -----------------------
-
- Each directory entry, whether in the root directory or in a subdirectory,
- contains the following fields:
-
-    byte(s)    contents
-    -------    ----------------------------------------------------
-    0-7        file name or first 8 characters of volume name
-    8-10       file extension or last 3 characters of volume name
-    11         attribute byte
-    12-21      unused
-    22-23      time
-    24-25      date
-    26-27      number of first cluster
-    28-31      number of bytes in file, or zero for subdirectory or
-               volume label
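-
- As a sketch, a 32-byte entry with this layout can be split into its raw
- fields with Python's standard struct module. The dictionary keys and the
- function name parse_directory_entry are illustrative choices only.

```python
import struct

# Bytes 0-31 of a directory entry, little endian, no alignment padding.
ENTRY_FORMAT = "<8s3sB10sHHHI"

def parse_directory_entry(entry):
    """Split one 32-byte directory entry into its raw fields."""
    (name, extension, attribute, _unused,
     time, date, first_cluster, size) = struct.unpack(ENTRY_FORMAT, entry)
    return {
        "name": name,
        "extension": extension,
        "attribute": attribute,
        "time": time,
        "date": date,
        "first_cluster": first_cluster,
        "size": size,
    }
```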
-
- The bits in the attribute byte determine the type of entry as follows (bit 7 is
- the most significant bit):
-
-    bit    meaning if bit = 1
-    ---    ---------------------------------------
-    7      unused
-    6      unused
-    5      file has been changed since last backup
-    4      entry represents a subdirectory
-    3      entry represents a volume label
-    2      system file
-    1      hidden file
-    0      read-only
-
- If the entry represents a subdirectory, bit 4 is set and all other bits are
- unused. If the entry represents a volume label, bit 3 is set and other bits are
- unused.
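-
- Testing the bits is straightforward. In the Python sketch below, the short
- labels (for example "archive" for bit 5) and the function name are chosen for
- the example and do not come from DOS itself.

```python
# Bit masks for bits 0-5 of the attribute byte; bits 6 and 7 are unused.
ATTRIBUTE_BITS = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "subdirectory",
    0x20: "archive",        # changed since last backup
}

def decode_attributes(attribute_byte):
    """List the labels of all bits that are set, lowest bit first."""
    return [label for mask, label in sorted(ATTRIBUTE_BITS.items())
            if attribute_byte & mask]
```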
-
- Since DOS file names are case-insensitive, the file name and extension contain
- no lower-case (small) letters. They are always converted to the corresponding
- capital letters before writing them to the disk. What DOS will do if it finds
- small letters in a file name or extension in a directory entry is undefined.
-
- The file name and extension are padded with ASCII blanks (hexadecimal 20) at
- the end if necessary to fill their respective fields.
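-
- The conversion from a name such as "readme.txt" to the 11-byte field can be
- sketched as follows. The function name is hypothetical, and the sketch checks
- only the lengths of the two parts; it does not reject characters that DOS
- does not allow in file names.

```python
def to_directory_name(filename):
    """Build the 11-byte name and extension field for a directory entry."""
    name, _, extension = filename.partition(".")
    if not 1 <= len(name) <= 8 or len(extension) > 3:
        raise ValueError("name does not fit the 8.3 format")
    # Capitalize, then pad each part with blanks (hexadecimal 20).
    field = name.upper().ljust(8) + extension.upper().ljust(3)
    return field.encode("ascii")
```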
-
- Subdirectory names and extensions follow the same rules.
-
- A volume name, however, is case-sensitive, and small letters are permitted.
- The name is always 11 characters long, and is apparently also padded with
- blanks to fill out its field.
-
- When a subdirectory is created, or when a file or volume label is created or
- modified, DOS writes the current time and date into the time and date fields.
-
- The time field is a two-byte value, stored in "little endian" order, with the
- less significant byte first. Here is its format (bit 15 is the most
- significant bit):
-
-    bits     contents
-    -----    ---------------------
-    15-11    hour (0-23)
-    10-5     minute (0-59)
-    4-0      double seconds (0-29)
-
- Since there are not enough bits in the time field to express the time to the
- nearest second, the time is first rounded or truncated to the nearest even
- second. The resulting seconds field is divided by 2 and written into bits 4-0.
- (Whether DOS actually uses rounding or truncation is undefined, but it scarcely
- matters.)
-
- The date field is a two-byte value, stored in "little endian" order, with the
- less significant byte first. Here is its format (bit 15 is the most
- significant bit):
-
-    bits     contents
-    -----    -----------------------------------------------
-    15-9     years elapsed since 1980 (0-127)
-    8-5      month (1=January, 2=February, ..., 12=December)
-    4-0      day (1-31)
-
- Hence the DOS world began on January 1, 1980 at 00:00:00 and will end on
- December 31, 2107 at 23:59:58.
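-
- The two bit layouts can be packed and unpacked as in the Python sketch
- below. The function names are invented, and the encoder truncates odd
- seconds, which is an arbitrary choice since the behavior of DOS is undefined.

```python
def decode_time(value):
    """Return (hour, minute, second) from a 16-bit time field."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2

def decode_date(value):
    """Return (year, month, day) from a 16-bit date field."""
    return 1980 + ((value >> 9) & 0x7F), (value >> 5) & 0x0F, value & 0x1F

def encode_time(hour, minute, second):
    """Pack a time; odd seconds are truncated to the even second below."""
    return (hour << 11) | (minute << 5) | (second // 2)

def encode_date(year, month, day):
    """Pack a date; the year must lie in the range 1980-2107."""
    return ((year - 1980) << 9) | (month << 5) | day
```

- For example, January 15, 1993 is stored as hexadecimal 1A2F, and the latest
- date that can be represented, December 31, 2107, is stored as FF9F.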
-
- The very first byte of a directory entry has another function. If it is
- hexadecimal E5, the entry represents a deleted file, subdirectory or volume
- label. All other parts of the entry are valid and represent the status of the
- entry just before it was deleted. If the first byte is zero (not an ASCII
- zero, which has the value hexadecimal 30, but an ASCII NUL), the entry is a
- terminating entry. It has never been used since the directory was created, and
- there are no entries after it in the same directory.
-
- When DOS 3.0 came along, characters from the IBM extended character set were
- permitted in file names and extensions. Unfortunately, this created an
- ambiguity for hexadecimal E5, which represents an IBM extended character and
- also represents a deleted entry. For compatibility between DOS 3.0 and disks
- formatted under previous versions of DOS, DOS 3.0 converts the extended
- character represented by hexadecimal E5 into 05 before writing it to the
- directory entry when it appears as the first character of a file name.
- Presumably, DOS gives the same treatment to the first character of a volume
- label.
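-
- A reader of directory entries therefore has to examine the first byte before
- trusting the name. The following Python sketch shows one way to do it; the
- function names and the three status strings are assumptions of the example.

```python
def entry_status(entry):
    """Classify a 32-byte directory entry by its first byte."""
    first = entry[0]
    if first == 0x00:
        return "terminating"      # never used; no entries follow it
    if first == 0xE5:
        return "deleted"
    return "in use"

def stored_name(entry):
    """Return the 11-byte name field, undoing the 05-for-E5 substitution."""
    name = bytearray(entry[0:11])
    if name[0] == 0x05:
        name[0] = 0xE5            # the real first character was E5
    return bytes(name)
```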
-
- If a file contains no data, bytes 28-31 (number of bytes in file) are zero,
- which is quite logical. However, bytes 26 and 27 (number of first cluster) are
- also zero, although FFF8 would probably have been more logical.
-
- Bytes 28-31 (number of bytes in file) for a subdirectory entry are always zero.
- The length of a subdirectory is not recorded in its directory entry. However,
- a subdirectory always contains at least one cluster. Its length can be
- determined only by examining the chain of file allocation table entries.
-
- The first two directory entries in a subdirectory are always the special
- subdirectories "." and "..", respectively. Their time and date fields are the
- same as those of their parent (the subdirectory in which they lie). The first
- cluster number of the subdirectory "." is also the same as that of its parent.
- The first cluster number of the subdirectory ".." is the same as that of the
- parent of its parent, or zero if the parent or its parent is the root
- directory.
-
- Unused fields in a directory entry are set to zeros. What DOS will do
- otherwise is undefined.
-
- Also undefined is what DOS will do if it finds a volume label in a
- subdirectory, or if it fails to find the special subdirectories "." and "..".
-
-
- 12. Directory Protocol
- ----------------------
-
- When a disk is formatted, every entry in the root directory is a terminating
- entry.
-
- When a subdirectory is created, it is immediately given one cluster. (If there
- are no free clusters, a subdirectory cannot be created and the operation
- fails.) The first two entries are filled with appropriate information for the
- special subdirectories "." and "..". All other entries become terminating
- entries.
-
- There cannot be two entries in the same directory with the same name and
- extension unless they are deleted entries or one is a volume label and the
- other is not. If the creation of a new file or subdirectory would cause a
- conflict, DOS usually fails the operation. However, if a new file is created
- with the same name and extension as an existing file in the same directory, and
- the file mode permits, the new entry will simply replace the old one. A new
- volume label in the root directory simply replaces the old one, if any.
-
- When an entry is deleted, its first byte is replaced by hexadecimal E5 to mark
- it as deleted. The rest of the entry is left unchanged. If the entry
- represented a file or subdirectory, all clusters associated with it are marked
- as empty in the file allocation table. The data in the clusters is not erased.
- Other entries in the directory are not moved to fill the gap. In fact, so much
- information is left intact that a special utility may be able to restore almost
- all of the deleted file, subdirectory or volume label if it is invoked
- immediately, before anything else is written to the disk. The first character
- of the name is truly lost, but other information remains on the disk and can be
- restored if it can be accurately located.
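-
- The deletion steps can be sketched in Python as follows. For simplicity the
- sketch assumes that the directory is held in a bytearray and that the file
- allocation table has already been unpacked into a list of entry values
- indexed by cluster number; both representations, and the function name, are
- choices made for this example.

```python
def delete_entry(directory, index, fat, bits=12):
    """Mark entry number index as deleted and free its cluster chain."""
    end_of_chain = 0xFF8 if bits == 12 else 0xFFF8
    offset = index * 32
    # Bytes 26-27 hold the first cluster number, little endian.
    cluster = int.from_bytes(directory[offset + 26:offset + 28], "little")
    directory[offset] = 0xE5          # only the first byte is changed
    while 2 <= cluster < end_of_chain:
        next_cluster = fat[cluster]
        fat[cluster] = 0              # mark the cluster as available
        cluster = next_cluster
```

- The data in the freed clusters and the rest of the directory entry are left
- untouched, which is what makes recovery possible.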
-
- When a new entry is added to a directory, it is put in the first available
- location. If the directory contains any deleted entries, the new entry
- replaces the first deleted entry. Otherwise, the new entry replaces the first
- terminating entry.
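-
- Because no entries follow a terminating entry, every deleted entry precedes
- the first terminating entry, so a single forward scan finds the right slot.
- The Python sketch below assumes the whole directory is available as one byte
- string; the function name is hypothetical.

```python
def find_free_slot(directory):
    """Return the index of the entry a new file would occupy, or None."""
    for index in range(len(directory) // 32):
        # 00 marks a terminating entry, E5 a deleted one.
        if directory[index * 32] in (0x00, 0xE5):
            return index
    return None                       # the directory is full
```

- A result of None corresponds to the case in which the directory contains no
- deleted or terminating entry.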
-
- If the directory contains no deleted or terminating entry, what happens next
- depends on whether it is the root directory or a subdirectory. If it is the
- root directory, the operation fails because the root directory cannot be
- lengthened. Otherwise, the directory is lengthened by adding another cluster
- to its allocation chain in the file allocation table. If there are no free
- clusters, the operation fails. Otherwise, the new entry is the first entry
- in the new cluster, and all other entries in the cluster become terminating
- entries.
-
- After a number of insertions and deletions, a directory can become quite
- fragmented. The directory search time can be increased substantially if there
- are a lot of deleted entries, especially near the beginning of the directory.
- Special utilities are available to clean up fragmented directories, although
- they may make it difficult or even impossible to restore deleted files or
- subdirectories.
-
-
-